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Abstract 

Background: As for other major crops, achieving a complete wheat genome sequence is essential for the application 
of genomics to breeding new and improved varieties. To overcome the complexities of the large, highly repetitive and 
hexaploid wheat genome, the International Wheat Genome Sequencing Consortium established a chromosome-based 
strategy that was validated by the construction of the physical map of chromosome 38. Here, we present improved 
strategies for the construction of highly integrated and ordered wheat physical maps, using chromosome 1 BL as a 
template, and illustrate their potential for evolutionary studies and map-based cloning. 

Results: Using a combination of novel high throughput marker assays and an assembly program, we developed a 
high quality physical map representing 93% of wheat chromosome IBL, anchored and ordered with 5,489 markers 
including 1,161 genes. Analysis of the gene space organization and evolution revealed that gene distribution and 
conservation along the chromosome results from the superimposition of the ancestral grass and recent wheat 
evolutionary patterns, leading to a peak of synteny in the central part of the chromosome arm and an increased 
density of non-collinear genes towards the telomere. With a density of about 1 1 markers per Mb, the IBL physical 
map provides 916 markers, including 193 genes, for fine mapping the 40 QTLs mapped on this chromosome. 

Conclusions: Here, we demonstrate that high marker density physical maps can be developed in complex 
genomes such as wheat to accelerate map-based cloning, gain new insights into genome evolution, and provide a 
foundation for reference sequencing. 

Keywords: chromosome IBL, evolution, gene space, grasses, hexaploid wheat, map-based cloning, physical map- 
ping, sequencing, synteny 
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Background 

Cereals crops, such as rice, maize, sorghum and wheat, 
are major caloric sources for humans and farm animals. 
While reference genome sequencesare available and are 
already supporting crop improvement in a challenging 
environment [l]for rice [2], sorghum[3]and maize [4], 
wheat genomics and its applicationis lagging behind. The 
wheat genome has always been viewed as impossible to 
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sequence because of the large amount of repetitive 
sequences (>80%) [5], gigantic size (17 gigabases (Gb)) 
and the level of ploidy of bread wheat {2n = 6x = 42). 
Even with the rapid developments in DNA sequencing 
technologies that enable the production of gigabases of 
sequence within a few days [6], the short read lengths 
offered by these techniques and the large amount of 
repeated sequences present in the wheat genome make 
de novo assembly of non-genic regions extremely difficult 
[7]. These difficulties can be circumvented by focusing 
only on the gene catalogue and ignoring the intergenic 
regions that mostly consist of transposable elements. 
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However, this practice is not justified in light of the 
results of whole genome functional analyses such as 
the characterization of 1% of the human genome in the 
ENCODE project [8] and association studies performed 
in maize [9] that clearly indicate the importance of inter- 
genie regions in the regulation of genome expression. 
Thus, a complete wheat genome sequence is needed to 
access the complete catalogue of genes and regulatory 
elements and to provide a framework for understanding 
the impact of genomic variation on phenotypes. While 
long read single molecule sequencing may in the future 
enable tacklingof large and complex genomes using only 
whole genome shotgun (WGS) sequencing, the only fea- 
sible approach at this time to obtain a complete reference 
genome sequence of bread wheat is bacterial artificial 
chromosome (BAG) by BAG sequencing based on the 
construction of robust physical maps. 

To reduce the complexity of physically mapping a 
17 Gb hexaploid genome containing more than 80% 
similar or identical sequences, the International Wheat 
Genome Sequencing Consortium (IWGSC) [10] has 
adopted a strategy based on the individual sorting and 
analysis of chromosome or chromosome arms by flow 
cytometry [11] to construct specific BAG libraries [12]. 
The first BAG library [13] was used successfully to estab- 
lish a chromosome landing-ready physical map of chro- 
mosome 3B, the largest wheat chromosome (1 Gb) [14]. 
This physical map has been used in several studies to 
analyze the composition and organization of the wheat 
gene space, provide estimates of the gene number, and 
determine the relative proportion of transposable ele- 
ments families in the wheat genome [5, 15, 16]. In contrast 
to early cytogenetic studies based on expressed sequence 
tag(EST) mapping suggesting that most of the genes are 
found in a few large, gene-rich regions [17], these analyses- 
revealed the presence of numerous small gene islands 
dispersed along the chromosome and no geneless region 
larger than 800 kilobases (kb). Moreover, access to physi- 
cal maps and sequences helped to refine coUinearityrela- 
tionships between wheat and the other grass genomesby 
providing a higher level of resolution than genetic or 
cytogenetic mapping[15,16,18].The strategy used to build 
the physical map of wheat chromosome SB was based on 
a high-information-content fingerprinting method [19] 
and FingerPrinted Gontigs (FPG) software [20,21] for the 
assemblies. It resulted in 1,036 contigs with an N50 of 
778 kb covering 82% of the chromosome [14]. To improve 
physical assembly in complex genomes, new software, 
called Linear Topological Gontig (LTG),has been devel- 
oped recently as an attractive alternative to FPG. It 
enables longer, better ordered and more robust contigs 
to be built compared to FPG contigs [22]. Physical maps 
are only useful when they are anchored to genetic maps 
and traits with markers. PGR methods used to anchor 



the physical map of chromosome 3B resulted in a marker 
density of 1.4 markers per megabase (Mb)and 56% of the 
physical map anchored. While useful for many map- 
based cloning projects, this marker density is far from 
that obtained in rice [23] or maize [24] (8 and 12 markers 
per Mb, respectively) and should be increased for breed- 
ing purposes. High throughput anchoring platforms [16] 
that increasethe number of genes anchored to the physi- 
cal maps have been developed in wheat recently but 
more anchoring resources and efforts are still needed. In 
addition to anchoring the physical map with markers, it 
is important to order the physical contigs along the chro- 
mosomes. Here, the wheat genome is again a challenge 
because of the uneven distribution and lack of recombi- 
nation in more than half of the chromosomes [25]. 

In this work, we used a combination ofnew high 
throughput genotyping assays and synteny with other 
grass genomes to establish a physical map of the wheat 
chromosome IBL with the highest marker density for a 
wheat physical map so far (11 markers per Mb), a high 
level of anchoring (74% in the deletion bins; 19% on the 
genetic map) and a good percentage (48%) of contigs 
ordered along the chromosome arm. This physical map 
allowed us to gain new insights into chromosome evolu- 
tion and refine estimates of physical sizes of deletion bins. 
Further, it provides a powerful tool for chromosome land- 
ing and for sequencing chromosome IBL in the near 
future. The new high throughput marker assays combined 
with the optimized assembly and ordering methodologies 
proposed here can be applied to other plant genomes with 
similar levels of redundancy and complexity. 

Results 

FingerPrinted Contigs and Linear Topological Contig 
assemblies of the IBL physical map 

A IBL-specific BAG library, containing 92,160 clones ori- 
ginating from sorted wheat chromosome IBL of Chinese 
Spring and representing 15.4x coverage of the arm [12], 
was fingerprinted with the SNaPshot technology. A total 
of65,413 high quality fingerprints (71%) wasobtained and 
used to build a physical map. A first automated assembly 
was performed with the FPC software [20,21] following 
the guidelines adopted by the IWGSC[26]. This resulted 
in an assembly of 43,523 fingerprints into 3,030 contigs 
representing 807 Mb(151% of chromosome IBL) with a 
N50 of 434 kb and a L50 of 391.A minimal tiling path 
(MTP) of 8,597 clones was designed and re-arrayed for 
further marker screening and analyses. Sixty-three 
-dimensional (plate, row and column) pools from the 
MTP and 240 plate pools from the whole IBL BAG 
library were produced.During the course of the project, a 
new software -LTC [22] -specifically developed to build 
physical maps in complex genomessuch as wheat,became 
available. To improve the assembly of the IBL physical 
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map for future sequencing, we performed an automated 
LTC assembly using the same 65,413 high quaUty finger- 
prints. It resulted in an assembly of 41,940 fingerprints 
(including 94.4% in common with the FPC assembly) 
into 694 contigs representing 502 Mb (94% of the chro- 
mosome arm) with a N50 value of 961kb and a L50 of 
162. The maximum contig size was of 5,800 kb in the 
LTC map, three times longer than thel,780 kb in the 
FPC. This improved LTCmap was used as a template for 
adding the marker and order information and for build- 
ing a final version of the map. 

A combination of high throughput approachesenables 
the construction of a 1BL physical map anchored with 
more 5,000 molecular markers 

At the beginning of the project, there were only 171 IBL- 
specific PCR markers (114 single sequence repeats (SSR) 
and 57 restriction fragment length polymorphisms 
(RFLP)) available publicly in the GrainGenes database [27]. 
Thus, to develop a high density integrated physical map of 
chromosome IBL, that is, a map comprising BAC contigs 
anchored to genetic and cytogenetic maps with a high 
number (> 1,000) of molecular markers, we developed new 
molecular markers and anchored them to the IBL physical 
contigs and genetic or cytogenetic maps. 

The new IBL markers were obtained from three differ- 
ent sources. The first source was2.17 million sequence 
reads obtained by the Roche 454 technology on amplified 
DNA from sorted chromosome IBL of Chinese Spring 
[28], used to develop insertion site-based polymorphism 
(ISBP) [29] and SSR markers. In total, 46,606 high confi- 
dence and non-redundant markersincluding 46,194 ISB- 
Pand 412 SSR markers were designed automatically by the 
IsbpFinder.pl [29] and ssrFinder.pl programs, respectively. 
From these, l,200ISBPs and 200 SSRswere randomly 
selected for direct PCR screening (see below) of the three- 
dimensional pools of the IBL MTP. To increase the 
throughput, we also developed a new platform for ISBP 
markers by designing an ISBP-lBL-specific NimbleGen 
microarray. To do this, sequences corresponding to junc- 
tions between a transposable element (TE) and a low-copy 
sequence were specifically selected among the 46,194 ISBP 
markers designed above. In total, 17,788 of such ISBP 
markers, including 193 in common with the 1,200 PCR- 
based ISBPs,were used to design a 17k IBL ISBP array 
that was then hybridized with the MTP pools (see below). 
Thus, 18,795 ISBP and 200 SSR markers designed from 
454 sequence reads of sorted IBL chromosome were used 
for screening the three-dimensional MTP pools.The sec- 
ond source of markers originated from hybridization of 
the three-dimensional MTP pools with the wheat Nimble- 
Gen 40k unigene microarray that was developed pre- 
viously by Rustenholz et [15]. Finally, 445 conserved 
orthologous set (COS) markers identified on chromosome 



group IL (lAL, IBL and IDL) from an ancestral set of 
10,000 grass COS markers defined by comparative studies 
between wheat, barley, rice, sorghum and Brachypodium 
[30] were used as a third source of markers for direct PCR 
screening of the MTP pools. Such COS markers should 
enable the IBL map to be linked directly to other grass 
genomes through the ancestral gene set. 

All molecular markers were then used for cytogenetic 
and genetic mapping on chromosome IBL. A total of 
1,611 PCR markers, including the newly developed 1,200 
ISBP and 200 SSR, the 171 publicly available PCR mar- 
kers, and 40 of the 445 COS corresponding to genes not 
present on the wheat NimbleGen 40k unigene microar- 
ray, werefirst checked for their specificity to wheat chro- 
mosomelBL using aneuploid lines from cv Chinese 
Spring,including a nuUisomic-lB-tetra-somic lA line, the 
IBL and IBS ditelosomic lines[31,32]. Out of the 1,611 
tested markers, 594(37%) were clearly IBL-specific. To 
obtain further indication of their position along the chro- 
mosome,the IBL-specific markers weretested on genomic 
DNA of eight deletion lines representing nine deletion 
bins. In total,549 markers (92%;475 ISBP, 67 SSR, 4 RFLP 
and 3 COS) were unambiguously assigned to one of the 
ninebins (Table 1). In addition, 84 markers (36 ISBPs and 
48 SSRs) showing polymorphism between Chinese Spring 
and Renan, the parents of a 381 F2 mapping population 
(CSReF2), were mapped genetically. Linkage analysis 
resulted in a genetic map of 124.6 centimorgan (cM)with 
an average of one marker every 1.5 cM. Ahigh marker- 
density was observed in the proximal part of the chromo- 
some, with 33 markers (39%) located in the first 10 cM of 
the map. Following the method used for chromosome 3B 
[14], we built a IBL neighbor genetic map using the Chi- 
nese Spring xRenanmap as a basis and sevenadditional 
genetic maps from fivebi-parental populations andtwo- 
consensus maps representing 13 populations (see Materi- 
als and methods). The IBL neighbor map consisted of 
478 markers including 223 SSRs, 97 genes, 80 RFLPs, 50 
ISBPs, 26 Diversity ArraysTechnology, 1 single nucleo- 
tide polymorphism and 1 protein marker(Figure IB). 
This represents a density of onemarker per 0.26 cM. 

To establish links between the physical, geneticand cyto- 
genetic maps, we first screened the 63 three-dimensional 
pools of the MTP and the 240 plate pools of the complete 
IBL BAC library with 465 IBL-specific PCR. Of these, 431 
markers amplified at least one MTP row, column and 
MTP plate pool and/or several plate pools. Deconvolution 
of the information was performed using a homemade Perl 
script (named DSMP.pl) for the MTP pools results and 
the Elephant (e/ectronic jE^/zysical map ^^^nchoring tool) [33] 
software for the plate pools results (see Materials and 
methods). This led to the assignation of 416PCR markers 
(320 ISBPs, 70 SSRs, 22 COSs and 4 RFLPs) to individual 
BACs (Additional file 1). The information was integrated 
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Table 1 Distribution of the number of markers, genes and physical contigs in 9 deletion bins along the centromere- 
telomere axis of chromosome 1 BL 

Markers assigned^ Contigs'' Contigs size*^ Markers'^ Genes^ Gene density^ Non-syntenic genes^ Genes in islands'^ 



Centromere 


3 


02.5 


02.03 


6 


2 


0.98 


2 


1 


C-lBLl 1-0.23 


115 


75.5 


93.15 


722 


168 


1.80 


99 


115 


IBLl 1-0.23-0.32 


37 


24.0 


35.67 


354 


77 


2.16 


52 


48 


1 BL6-0.32-0.47 


36 


27.0 


29.05 


247 


68 


2.34 


24 


37 


IBLl -0.47-0.61 


109 


76.0 


93.14 


1160 


232 


2.49 


114 


158 


IBLl 4-0.61 -0.69 


34 


24.0 


29.52 


311 


80 


2.71 


57 


65 


1 BL2-0.69-0.74 


79 


55.0 


53.71 


714 


155 


2.89 


102 


105 


1 BL8-0.74-0.85 


48 


34.0 


22.35 


418 


66 


2.95 


52 


48 


1 BL3-0.85-0.89 


51 


35.5 


23.88 


507 


81 


3.39 


47 


62 


1BL4-0.89-1.00 


37 


26.5 


15.10 


302 


57 


3.77 


40 


45 


Total 


549 


380 


397.6 


4741 


986 


2.48 


587 


683 



^ Number of markers assigned to the deletion bins; "^number of contigs assigned to the deletion bins; "^cumulated size of the assigned contigs in Mb; ^total 
number of markers in the assigned contigs; ^number of genes in the assigned contigs; ^the gene density in genes per Mb was estimated in each bin using the 
number of genes anchored divided by the cumulated contigs size; ^genes not conserved in syntenic regions of rice, Brachypodium or sorghum are indicated for 
each bin; '^the number of gene islands was calculated for each bin using the number of genes in the same BAC or overlapping BACs in each bin. 



into the final version of the physical map obtained by 
LTC, resulting in the anchoring of 241 contigs with the 
416 markers. 

The 40k unigene and 17k IBL-ISBP NimbleGen arrays 
were hybridized with the 63 three-dimensional MTP pools. 
After signal quantification, normalization (see Materials and 
methods) and data deconvolution, 3,912 ISBPs and 1,615 
unigenes were unambiguously assigned to individual BACs 
and subsequently to contigs of the LTC map. Ten ISBPs 
were randomly selected to control the accuracy of the assig- 
nation using microarray hybridizations.In nine cases, the 
putative positive poolswere confirmed,demonstrating the 
robustness of the methodology. To confirm the identity of 
the putative IBL genes identified by hybridization of the 
40K unigene array, we performed two additional controls: 
hybridization of genomic DNA from sorted chromosome 
IBL on the array, and identification of the IBL unigenes of 
the array by sequence similarity with the Roche454 
sequence reads obtained from sorted chromosome IBL. 
Using these data, 392 unigenes assigned to the physical con- 
tigs by hybridization of the three-dimensional MTP pools 
but not confirmed by the 454 reads or by hybridization of 
the sorted chromosome IBL genomic DNA were discarded 
to avoid any ambiguity.This resulted in 1,223 unigenes 
assigned with high confidence to the IBL physical contigs. 

To eliminate redundancy in the unigene set, weused 
information from orthologous genes in rice and Brachy- 
podium distachyon as well as the latest release of the 
wheat unigene build (version 59). This resulted in the 
elimination of 62 redundant unigenes from the IBL 
physical map, leading to a total of 1,161 unique genes 
unambiguously assigned to the IBL physical map. Thus, 
together with the 4,232 ISBPs, 70 SSR, 22 COS and4 
RFLP, thel,161 unigenes provide a IBL physical map 
anchored with 5,489 markers (Additional file 1). 



The marker information was then used to guide a man- 
ual assembly step for the physical map (see Materials and 
methods). This enabled themerging of 78 contigs,thereby 
resulting in a final IBL physical map of 616 contigs repre- 
senting 497 Mb (93% of the chromosome arm)with a N50 
of 1128 kb, a L50 of 142 and a density of markers of 11 
markers per Mb. To order the physical contigs of the IBL 
map along the chromosome, we identified the contigs car- 
rying the 543 markers located in the nine deletion bins as 
well as the 84 markers from the Renan x Chinese Spring 
genetic map. This enabled the placement of 380 of the 616 
contigs representing 74% of chromosome IBL (397.6 Mb), 
including 986 genes (84.9% of the 1,161 wheat IBL uni- 
genes),in thenine deletion bins, and 84 contigs represent- 
ing 19% of chromosome IBL (103 Mb) on the IBL 
neighbor genetic map (Figure lA; Additional file 2). Three 
contigs representing 2 Mb were assigned specifically to the 
centromere (Figure lA).The anchored physical map of 
wheat chromosome IBL is available from the Unite de 
Recherche Genomique Info website [34]. 

Physical to genetic map ratio analyses reveal 
discrepancies in the estimation of the deletion bin sizes 
between the physical and cytogenetic maps 

The anchored physical map was used to measure the ratio 
between physical and genetic distances and study the 
recombination pattern along chromosome IBL. We first 
estimated the size of the nine deletion binsusing the 
cumulative size of the physical contigs anchored in each 
bin corrected by the estimated percentage of chromosome 
coverage (74%). It ranged from 20 to 125 Mb (Figure 2). 
When compared to the estimations based on cytogenetic 
measurements[31,32,35],some striking differences were 
observed (Figure 2). While three deletion bins (C-lBLll- 
0.23, lBLll-0.23-0.32 and 1BL14-0.61-0.69) had very 
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Figure 1 Representation of the integrated physical and genetic map and distribution of recombination rate along wheat chromosome 

1BL (A) Representation of the IBL deletion bin map. The centromere is represented as a grey circle and the nine deletion bins are represented 
by colored boxes as follows: C-lBLl 1-0.23 deletion bin in purple, 1 BLl 1-0.23-0.32 in pink, the 1 BL6-0.32-0.47 in blue, 1 BLl -0.47-0.61 in light blue, 
IBL 14-0.6 1-0.69 in green, 1 BL2-0.69-0.74 in light green, 1 BL8-0.74-0.85 in yellow, 1 BL3-0.85-0.89 in red andl BL4-0.89-1 .00 in dark red. The number 
of physical contigs assigned to a bin and the cumulative size of these contigs are indicated. When contigs carried BACs that were assigned to 
two different consecutive bins indicating that they likely are at the junction between the bins, the contig was counted for 0.5 in each bin. 
(B) Representation of the IBL neighbor genetic map. The map is divided into segments corresponding to the deletion bins except for deletion 
bins 1 BLl 1-0.23-0.32 and 1 BL6-0.32-0.47 that were merged. (C)Representation of the ratio between the genetic and the physical distances along 
the IBL chromosome using physical contigs to estimate the bin sizes. The dotted line corresponds to the average ratio on the whole 
chromosome arm. Values are expressed in cM/Mb. 



similar values, the sizes of the contiguous 1BL6-0.32-0.47 
and lBLl-0.47-0.61 deletion bins were respectively overes- 
timated (49 %) and underestimated (60 %) by the cytoge- 
netic estimation. Interestingly, the cumulated sizes of the 
two bins were similar in the two estimates (only 6% of dif- 
ference), suggesting an error in the measure of the limit 



between the two deletion bins by the cytogenetic analysis. 
A similar difference was observed with the contiguous 
1BL2-0.69-0.74 and 1BL8-0.74-0.85 deletion bins (16% dif- 
ference between the two estimations of the cumulative 
size of the two deletion bins). The largest discrepancy 
(34%) was observed for the most distal bin 1BL4-0.89-1.00. 
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6 

Name of the deletion bins 

Figure 2 Comparison of the size ofnine deletion bins along chromosome 1BL using cytogenetic and physical estimations. The 

cytogenetic and physical estimates are provided in blue and red, respectively. The size of each bin is provided on the Y axis in Mb. Bins are 
ordered from the left to the right along the centromere-telomere axis. 



Based on an overall estimated size of 535 Mb for chro- 
mosome IBL [12] and a genetic map of 124.6 cM, the 
average genetic to physical distance ratio is 0.23 cM/Mb. 
The ratio between the genetic and physical distances was 
calculated further for each deletion bin using size esti- 
mates obtained from the physical maps. For this analysis, 
the lBLll-0.23-0.32 and 1BL6-0.32-0.47 deletion bins 
were merged because it was impossible to identify unam- 
biguously the limit between them on the IBL genetic map 
(Figure IB). The pattern of the cM/Mb ratio along chro- 
mosome IBL revealed three main parts. The first one, 
representing 63% of the chromosome (from the centro- 
mere to the lBLl-0.47-0.61), had a ratio close to 0.05 cM/ 
Mb on average (Figure IC). In the second part, represent- 
ing 21% of the chromosome (1BL14-0.61-0.69 and 1BL2- 
0.69-0.74 deletion bins), the ratio increased to 0.20 cM/ 
Mb on average (Figure IC); whereasin the remaining 16% 
of chromosome lBL,corresponding to the telomeric part, 
the average ratio dramatically increased toabout 1 cM/Mb 



and up to 1.46 cM/Mb in the most telomeric 1BL4-0.89- 
1.00 deletion bin. 

Synteny-based approaches to establish a putative gene 
order along wheat chromosome IBL 

Using the information from marker anchoring in the nine 
deletion bins, we were able to propose a rough order 
along chromosome IBLfor 380 contigs including 986 
genes. However, within each bin, it was impossible to 
assess the relative order of the contigs except for those 84 
that were anchored on the genetic map. To progress 
further in ordering the contigs and to analyze the gene 
space distribution, we used the synteny between the genes 
located on the IBL physical contigs and the genome 
sequences of rice, B. distachyon, and sorghum.Among the 
986 genes assigned to the IBL deletion bins, 815 homolo- 
gous genes were identified by sequence alignments (see 
Materials and methods) in B. distachyon, 816 in rice and 
834 in sorghum (Figure 3A-C; Additional file l).The 815 
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Figure 3 Schematic representation of the syntenic relationships between wheat chromosome 1BL and the orthologous chromosomes 
in Brachypodium distachyon, rice and sorghum as well as the 1BL GenomeZipper. (A) B. distachyon, (B) rice and (C) sorghum 
chromosomes in which syntenic regions were identified are represented in grey on the right side of the circle. Large areas represent the 
syntenic regions identified with each species while individual colored lines identify the non-syntenic genes. The black lines represent a wheat 
specific genome rearrangement. Black circles correspond to the centromeres of wheat chromosome IBL, Bradi3, OslO and SbOl. (D) Comparison 
between the IBL virtual gene order based on the IBL physical map (left part of the circle) and on the GenomeZipper approach (right part of 
the circle). Each line provides a link between the positions of the same gene on the two virtual gene ordering. The nine deletion bins of wheat 
chromosome IBL are represented in colors on the left side of the circle:C-l BLl 1-0.23 deletion bin in purple, the 1 BLl 1-0.23-0.32 in pink, the 
1 BL6-0.32-0.47 in dark blue, the 1 BLl -0.47-0.61 in blue, the 1 BLl 4-0.61 -0.69 in dark green, the 1 BL2-0.69-0.74 in green, the 1 BL8-0.74-0.85 in 
yellow, the 1 BL3-0.85-0.89 in orange and the 1 BL4-0.89-1 .00 in red. 



homologous Brachypodium genes were found on the five 
Brachypodium chromosomes with a majority (74.5%) on 
chromosome Bradi2 (354 genes) and BradiS (253 genes) 
(Figure 3A). Out of the 354 genes on Bradi2 and 236 
genes on Bradi3, respectively 236 (66.7%) and 128 (50.6%) 
formed unique coUinear blocks defining syntenic regions 
on these chromosomes (Figure 3A). In rice, 206 and 103 



of the 816 homologous genes were found in single coUi- 
near blocks on rice chromosomes 5 and 10, respectively 
(Figure 3B). In sorghum, 329 of the 834 homologous 
genes were found in three coUinear blocks including two 
on chromosome SbOl (65 and 44 genes) and one on chro- 
mosome Sb09 (220 genes) (Figure 3C). Interestingly, the 
disruption of coUinearity observed on chromosome SbOl 
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corresponds to the centromeric region. When compared 
to the position and size of the centromeric regions of 
chromosome IBL, BradiS and OslO (Figure 3), the results 
suggest a shift of at least 13 Mb of the SbOl centromere 
and an expansion of the centromeric regionthat is likely 
due to the accumulation of LTR-retrotransposons, as 
observed previously for all sorghum chromosomes [3]. 
Thus, these results show a clear breakpoint in the synteny 
between the genes located in the first three deletion bins 
of chromosome IBL and those in the remaining six dele- 
tion bins. The first region corresponds to chromosome 
Bradi3, OslO and Sb01,andthe second corresponds to 
chromosomes Bradi2, Os5 and Sb9. 

In total, 399 of the 1,161 (40.5%) IBL genes corre- 
sponding to 181 contigs were syntenic with rice, sorghum 
or B. distachyon;wher:e2iS, 587 (59.5%) genes correspond- 
ing to 234 contigs were non-syntenic (Table 1; Additional 
file 1). Among the non-syntenic genes, there was no sig- 
nificant bias towards any of the other chromosomes in 
the three grass species (Figure 3A-C; Additional file 1). 

A putative order of physical contigs within the deletion 
bins was established for the contigs carrying syntenic genes 
based on the order of their ortholog in at least one of 
the three grass species (Figure 3D). This provided an order 
for 180 contigs representing 48% of chromosome IBL 
(257 Mb) (Additional file 2). For 10 contigs (15 syntenic 
genes), the deletion bin information wasinconsistent with 
the synteny, indicating putative rearrangements. Seven of 
the contigs were anchored with a single gene and therefor- 
ethere was insufficient information to conclude potential 
rearrangements between wheat and the other genomes. 
The three remaining contigs (4.6 Mb of total size) were 
anchored in deletion bin lBLll-0.23-0.32 and corre- 
sponded to single blocks in rice (Osl0g0573800 to 
Osl0g0576000), B. distachyon (Bradi3g34090 to Bra- 
di3g34400) and sorghum (Sb01g28100 to Sb01g28320) that 
are syntenic with deletion bin 1BL6-0.32-0.47 (Figure 3A-C). 
This result indicates an intra-chromosomal translocation of 
a regionof at least 4.6 Mb in the wheat lineage. 

In a second step,we established a putative order along 
the wheat chromosome IBL for the genes located in the 
180 ordered contigs usingthe relative position of the BACs 
within the contigs. This resulted in ordering 787 genes 
(68% of the IBL genes set) along the IBL physical map 
(Additional file l).To check this putative gene order, we 
compared it to a gene order obtained by the GenomeZip- 
per [36,37] approach, in which a virtualgene orderis built 
using synteny information and genetic mapping. A zipper 
of the wheat chromosome IBL was performed by integrat- 
ing information from a wheat IBL genetic map comprising 
242 gene-based single nucleotide polymorphism markers 
obtained by genotyping by sequencingof the International 
Triticeae Mapping Initiative (ITMI) population[38], 
198,968 sequence contigs from the Illumina sequencing of 



sorted wheat chromosome IBL performed by the IWGSC 
[10], wheat ESTs from the HarvEST database [39], barley 
full-length cDNA, and the rice, sorghum and/or B. dis- 
tachyon genes (Additional file 3). The IBL zipper provided 
a virtual order forl,593wheat loci including 1,433 genes 
organized in eightsyntenic blocks containing between 29 
and 506 genes (average = 179) and 2.6 markers per block 
(min = 1, max = 8) (Figure 3D;Additional file 3). In total, 
429 genes with an average of 51 genes (8 to 132) per syn- 
tenic block were shared between the physical map-based 
and zipper-based virtual gene orders. Of these genes, 
354 (82.5%) were found in the exact same order in the two 
maps. However, the relative orientation of blocks BSl, 2, 
3, 5 and 7in the zipper was completely invertedcom- 
pared to the bin order(Figure 3D). A closer look at the 
genetic mapping data of the IBL zipper revealed that in 
these syntenic blocks, the numberof markers and 
recombination events was very low (Additional file 3), 
thus leading to unreliable orientations of the blocks. In 
the case of BSl and BS5, the information of the physical 
map clearly demonstrates that the genetic map is erro- 
neous (Figure 3D). These results suggest that the resolu- 
tion and accuracy of the gene-based wheat genetic maps 
remains a limiting factorfor reliable ordering of wheat 
physical maps using synteny with the other grass 
genomes. 

With about half of the BAG contigs and 68% of the 
genes ordered, the physical map of chromosome IBL 
provides a unique resource for detailed analyses of the 
gene space, accelerated map-based cloning,and future 
chromosome sequencing. 

Gene space organization and evolution of wheat 
chromosome IBL 

The 986 genes assigned to one of the nine wheat chromo- 
some IBL deletion bins allowed us to calculate the gene 
density per deletion bin using the cumulated length of 
anchored contigs in each deletion bin (Table 1). The 
results show that the gene density distribution correlates 
with the distance from the centromere (Pearson s correla- 
tion coefficient r = 0.956, P = 5.10"^), demonstrating the 
presence of a gradient of gene density with a two-fold 
increase from the centromere (1.8 genes/Mb in G-lBLll- 
0.23) to the telomere (3.77 genes/Mb in 1BL4-0.89-1.00; 
Figure 4). We then examined the correlation between the 
distribution of gene density and the density of syntenic 
and non-syntenic genes per deletion bin (Figure 4). This 
revealed that the gradient of gene density along chromo- 
some IBL is mainly due to the presence of non-syntenic 
genes (Pearson's correlation coefficient r = 0.870, P = 
0.0022) whereas the distribution of syntenic genes has no 
impact on the overall gradient (Pearson s correlation coef- 
ficient r = 0.370, P = 0.326; Figure 4). Thus, these results 
indicate a gradient of gene density from the centromere to 
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Figure 4 Gene distribution along wheat chromosome 1BL. The gene density in each of the nine deletion bins is indicated in gene/Mb on 
each side of the schematic representation of the chromosome. The left end side presents the relative percentage of syntenic and non-syntenic 
genes per bin whereas the percentage of genes found in islands versus isolated is indicated for each bin on the right end side. Bins are 
represented with the following color code:C-l BLl 1-0.23 deletion bin in purple, the 1 BLl 1-0.23-0.32 in pink,l BL6-0.32-0.47 in blue, 1 BLl -0.47-0.61 
in light blue, 1 BLl 4-0.61 -0.69 in green, 1 BL2-0.69-0.74 in light green, 1 BL8-0.74-0.85 in yellow,l BL3-0.85-0.89 in orange and 1 BL4-0.89-1 .00 in red. 
In wheat, chromosomes of group 1 originate from the fusion between the ancestral proto-chromosomes A5 and AlO [40]. The parts of 
chromosome IBL that originated from proto-chromosome AlO and A5 are marked by dark grey and light grey boxes, respectively. 



the telomere of chromosome IBL that is correlated with 
the proportion of non-syntenic genes. 

To further investigate the gene space organization, we 
studied the proportion and distribution of gene islands 
along the chromosome arm. Gene islands were defined as 
regions in which genes are located on the same or overlap- 
ping BACs. A total of 683 genes (69%) were identified as 
genes in islands and formed 284 gene islands composed of 
two to eight genes (average = 2.9 ±1.2, median = 2). Such 
a proportion of genes in islands is higher than the percen- 
tage expected from a random distribution as revealed by 
10,000 random samplings without replacement of 1,161 
genes on chromosome IBL BACs (average = 54% ±1.6%, 
P = lO'^^by test). The remaining 304 genes (31%) were 
considered as isolated genes whose density was shown to 
not be correlated with the distribution of total gene den- 
sity (Pearson s correlation coefficient r = 0.182, P = 0.638). 
The proportion of genes in islands varied among deletion 
bins (ranging from 54% in 1BL6-0.32-0.47 to 81% in 



1BL14-0.61-0.69) and the density of genes in islands was- 
correlated positively with the distribution of the total gene 
density (Pearson's correlation coefficient r = 0.962, P = 
0.000033; Figure 4). Moreover, this density wasalso corre- 
lated with the density of non-syntenic genes (Pearson's 
correlation coefficient r = 0.916, P = 0.00051) whereas no 
correlation wasfound with the density of syntenic genes 
(Pearson's correlation coefficient r = 0.208, P = 0.590). 
These results show that the gradient of gene density 
observed along chromosome IBL results from an increase 
of genes in islands along the chromosome axis from cen- 
tromere to telomere. 

As mentioned above, the general pattern of the distri- 
bution of syntenic and non-syntenic genes along the 
chromosome showed an increase in the proportion of 
non-syntenic genes from the centromere to the telo- 
mere. However, an additional peak of synteny was 
observed in the proximal bin 1BL6-0. 32-0-47 (65%; 
Figure 4). 
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In wheat, chromosomes of group 1 and group 3 origi- 
nate from an ancestral duplication of one of the five 
proto-chromosomes (A5) identified by Salse et aL[40] 
during paleogenomics studies of the grass genomes. 
Here, we wanted to investigate how many of the 1,161 
genes found on chromosome IBL are still conserved on 
chromosome SBLand originate from the ancestral proto- 
chromosome A5. Toidentify these genes, we usedthe 
ancestral grass gene set defined by Murat et aL[^l\, that 
is, genes conserved among all grass genomes. We identi- 
fiedl28 rice genes corresponding to 64 genes duplicated 
between rice chromosomes 1 and 5 that also derive 
fromthe ancestral proto-chromosome A5[41]. Their cod- 
ing regions were alignedbytblastx analysis against the 
Illumina contigs from the IWGSC survey sequencing of 
all wheat chromosome arms. All wheat genes matching 
an ancestral rice gene with 35% of identity at the amino 
acid level on 70% of the length of the rice genes were 
considered as putative homologs to the ancestral gene. 
Nineteen of the 128 rice genes matched more than 12 
wheat chromosome arms andwere eliminated from the 
analysis as they were considered belonging to large multi- 
gene families. We found 18 homologs to ancestral dupli- 
cated gene pairs on rice chromosomes 1 and 5 located on 
wheat chromosome lAL, 14 on IBL and 10 on IDL, cor- 
responding to 21 non-redundant genes in chromosome 
group IL (Table 2). In chromosome group 3L, 12 homo- 
logs were found on wheat chromosome 3AL and nineon 
3DL. For chromosome 3B,the survey sequences corre- 
spondedto theentire chromosome (and not to the two 
arms separately), and 38 homologs were identified 
including 12that were homologous to the same rice chro- 
mosome 1 and chromosome 5 ancestral duplicated pairs 
as the IBL genes. These 12 genes were considered to 
belocatedon the long arm of wheat chromosome 3B 
(Table 2). This was confirmed by ananalysis of the chro- 
mosome 3B reference sequence recently established by 
our laboratory (unpubUshed data).All together, 19 non- 



redundant genes were found on chromosome group 3L. 
Using these results, we identified ninegenes from the 
ancestral duplications between wheat chromosome lAL 
and the 3AL, 12 between IBL and 3BL, and sixbetween 
IDL and 3DL, leading to 16 non-redundant genes from 
the ancestral dupUcation between group IL and 3L. The 
1,161 gene set anchored to the IBL physical map con- 
tained eightof the 12 ancestrally duplicated genes identi- 
fied between wheat chromosome IBL and 3BL (data not 
shown), demonstrating that there are a few genes from 
the ancestral grass genome duplication that can still be 
detected.Interestingly, the 64 genes from the ancestral 
proto-chromosomeAS that are still conserved between 
rice chromosomes 1 and 5 representl.2% and 2.1%of the 
total gene content for the two chromosomes, respectively 
(5,078rice chromosome 1 and3,118 rice chromosome 5 
non-TE-related genes). Bycontrast,in wheat, these con- 
served ancestral genes represent onlyO.32% of chromo- 
some IBL and 0.25% ofchromosome 3BL genes, based on 
estimations of3,700 genes [28] on IBL and4,700 on 3BL 
[18], thereby suggesting a higher level of rearrangement 
during the evolution of the wheat genome than for rice. 

A physical map with more than 5,000 markers to support 
efficient map-based cloning on chromosome IBL 

The high quality of the IBL physical map (616 contigs 
covering 93% of the chromosome arm), the high number 
of markers (5,489 including 1,161 genes) anchored to the 
physical map and the good percentage of ordered contigs 
(48% of the chromosome arm) provide a robust platform 
for supporting map-based cloning. Currently,40 quantita- 
tive trait loci (QTLs) involved in various agronomically 
important traitssuch as resistance to biotic stress, nitro- 
gen use efficiency or bread-making quality have been 
mappedon wheat chromosome IBL (as of July 2012) [42]. 
The average confidence interval of these QTLs is6.68 cM. 
Based on a ratio of 0.06 cM/Mb in 70% of the centro- 
meric part of the IBL genetic map and 0.6 cM/Mb in the 



Table 2 Number of wheat genes originating from the grass ancestral duplication identified on wheat chromosomes 
group 1 and 3 

Wheat chromosome or chromosome arm Number of wheat genes originating from the grass ancestral duplication^ 



lAL 


18 


IBL 


14 


IDL 


10 


Group IL 


21 


SAL 


12 


3BL 


12 


3DL 


9 


Group 3L 


19 


Other long arms of wheat chromosomes 


3.4 ±2.5 


Other groups L 


6.4 ±3.5 



^ Chromosomal distribution of the 64 ancestral duplicated genes pairs from rice chromosomes 1 and 5 matching the Illumina contigs from the IWGSC survey 
sequencing of all wheat chromosomes. 
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remaining part, the average confidence interval of these 
QTLs is about 83 Mb. Thus, with the current marker 
density of the IBL physical map, each QTL interval con- 
tains potentially 916 markers including 193 genes. The 
potential of the IBL map for positionalcloningcan be illu- 
strated with a representative example from the literature. 
For a QTL found for bread volume (Bvol[43])that was 
mapped on chromosome IBL in a confidence interval of 
11.8 cM flanked by markers wmcl56 and gwm403, our 
integrated physical map provides 50 ordered physical 
contigs, containing 1,066 markers including 248 genes, 
and 105 unordered contigs, containing 1,113 markers 
including 219 genes for further fine mapping and posi- 
tional cloning. With such information tohand, it is possi- 
ble to define a subset of markers ordered along the 
interval and initiate fine mapping in a high resolution 
recombinant populationto rapidly identify physical con- 
tigs that can provide additional marker and gene candi- 
dates or be sequenced. When the QTL is refined to 
smaller intervals and locatedin regions with a ratio of 
0.6 cM/Mb, as is the case for nine QTLs currently 
mapped within an estimated confidence interval of less 
than 3 cM on chromosome IBL, the potential of the IBL 
physical map becomes even higher. Indeed, with an N50 
value for the contig size of 1.1 Mb, oneto three contigs 
should be sufficient to cover such intervals, thereby pro- 
viding landing pads for gene isolation. 

Discussion 

Challenges and new opportunities for constructing and 
anchoring physical maps in wheat 

Because ofits size (17 Gb) and complexity (polyploid (2n = 
6x =42) with a large amount of repetitive sequences 
(>80%)) [18], the wheat genome has been viewed as 
'impossible' to physically map and sequence. Recently, the 
construction of a physical map of the largest wheat chro- 
mosome (3B) demonstrated the feasibility of physical map- 
ping in hexaploid wheat using a chromosome-based 
approach [14]. Here, using the same tools (SNaPshot, 
FPC) andassembly methodology, we built a first automated 
assembly of the IBL physical map. It resulted in a coverage 
of 151% of the chromosome arm, which suggeststhat a lot 
of small contigs covered the same regions as larger con- 
tigs.Such a result can be explained by fingerprints of low 
quality. In fact, atechnical problem occurred on the auto- 
mated sequencer during fingerprinting leading to low 
reproducibility. As initial assemblies with FPC are per- 
formed at high stringency (cut-off of le'^^) andBAC fin- 
gerprints are merged into contigs only if they share more 
than 70% of the bands [14], low quality fingerprints result 
in BAG clones from the same regions that are not merged, 
thereby increasing the number of contigs for a given 
region. Fingerprinting in wheat is not trivial. A number of 
critical steps (bacterial growth conditions, restriction 



enzymes quality, running conditions) for producing high 
quality fingerprints have been identified by key labora- 
tories involved in wheat fingerprinting [26]. One of the 
important factors is to perform the fingerprints in a 
comparable manner throughout the duration of the 
experiment (that is, several months)and using the same 
sequencer is strongly recommended.Decreasing the initial 
stringency of the FPG assembly to compensate for lower 
fingerprint quality is not an optionin wheat because the 
high percentage of repetitive sequences leads to a large 
numberof chimerical contigs [14]. Dealing with such pro- 
blems and improving assembly in wheat was one of the 
rationales behind the development of theLTG software 
[22]. Here, we demonstrated with a real case study that 
LTG improves assembly even with a suboptimal quality of 
fingerprints. Assembly with LTG resulted in a robust IBL 
physical map covering 94% of the chromosome arm and 
with an N50 of 1,128 kb, that is,about threetimes longer 
than with FPG andgenerally higher than the values 
achieved so far with FPG in wheat and barley [14,44]. 

Whole Genome Profiling (WGP^m) that relies on 
sequence-based fingerprinting of BAGs to buildphysical 
mapshas been developed recently [45] and was evaluated 
for wheat using a subset of the wheat 3B chromosome 
BAG library [46]. The results showed that, with an 
improved assembly methodology, the WGP-based physical 
map contained 30% fewer contigs than the SNaPshot 
physical map with an equivalent coverage of the target 
regions, and that the WGP contigs contained 3.5 times 
less mis-assembled BAGs. Moreover, further improve- 
ments of the methods by using EcoRl instead of Hindlll 
for the restriction digest and increasing the length of the 
sequence tags using longer reads were proposed [46]. 
Thus, based on the experience with the physical map of 
chromosome IBL and the WGP pilot study on chromo- 
some 3B,we recommend combining the improved WGP 
fingerprinting methodology with the LTG assembly soft- 
ware for future physical mapping efforts in wheat. 

To achieve the full benefit of a physical map, BAG con- 
tigs need to be anchored and ordered to genetic and/or 
cytogenetic maps with molecular markers. PGR-based 
methods coupled with pooling strategies arecost effective 
to perform such anchoring. Here, we used the same pool- 
ing scheme (three-dimensional pools) as forthe 3B physical 
map [14], but with an improved method using a new 
Perl script called DSMP.pl to automate the deconvolution 
of the three-dimensional MTP pool screening results. This 
enabled us to deconvolute the information for89.5% 
of markers. Moreover, the hybridization of the three- 
dimensional IBL MTP pools on gene-based (40k) and 
ISBP-based (17k) arrays allowed us to place more than 
5,000 markers on the physical map of chromosome IBL. 
This represents a marker density of 11 markers per Mb, 
the highest density of markers obtained to date for a 
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wheat physical map. The marker density on the IBL physi- 
cal map is about 10 times higher than the first version of 
the 3B physical map (1.4 marker/Mb) [14] and the number 
of contigs anchored and ordered represents 74% of chro- 
mosome IBL, compared with only 56% of chromosome 
3B. This marker density is equivalent to the density 
obtained for the maize physical map (12 markers per Mb) 
[24] and is even higher than the density obtained forthe 
rice physical map (8 markers per Mb) [23] and the recent 
barley physical map (0.65 markers per Mb) if we exclude 
the markers placed in silico[44]. 

One of the recurrent difficulties in wheat physical map- 
ping is the lack of precision in orderingmarkers along the 
chromosomes. Until recently, ordering in wheat was per- 
formed using deletion bins ranging from 20 to 125 Mbin 
size [47,48], thereby making it nearly impossible to assess 
the order of contigs or markers within bins. In addition, 
estimates of the deletion bin sizes thathave been used to 
calculate physical to genetic distances ratio (cM/Mb) in 
wheat are prone to errors as demonstrated in this study. 
Ourresults suggest inaccurate estimates of the bin sizes 
by cytogenetic measurements, with increased errors at 
the telomeric ends of the chromosome thatarelikely due 
to higher decondensation of the DNA in these regions. 
Cytogenetic measurements of the deletion bin sizes were 
initially performed on photographs with rulers, which 
can largely explain the inaccuracy [35]. Alternatively, it is 
possible that the size of some deletion bins estimated by 
the physical map is underestimated due to under-repre- 
sentation of the corresponding region in the BAG library. 
Thus, these results confirm that, in wheat, the recombi- 
nation rate is very low in 70% to 80% of the chromosome 
and is multiplied by 10 to 20 in the remaining 20% to 
30% with a steady increase towards the telomeres. They 
suggest also that the construction of physical maps pro- 
vides a reliable substitute to deletion bins for performing 
accurate recombination studies, more particularly in the 
telomeric regions where the cytogenetic estimation 
seems erroneous. 

Ordering physical contigs can be supported to some 
extent by genetic mapping. Here, we obtained a genetic 
map of 124.6 cM, which is in the range of the length 
reported for IBL in the literature [48-50]. However, 
recombination is not evenly distributed along the wheat 
chromosomes and in about two-thirds of the chromosome 
length, recombination is severely reduced or absent [25]. 
This makes it impossible to order about two-thirds of the 
wheat physical maps, especially when using bi-parental 
genetic maps with small size populations [14,25,48,51,52]. 
To overcome this problem, we developed a strategy that 
combinesgenetic mapping with deletion bin assignation 
and synteny with rice, B. distachyon and/or sorghum. The 
use of synteny in grasses to order BAG contigs and genes 
is based on the fairly high coUinearity observed between 



the cereal genomes [53,54]. Here, we confirmed this high 
coUinearity by the good correlation between the deletion 
bin assignation of the IBL unigenes and the order of the 
orthologous genes in rice, sorghum and B, distachyon. 
One exception was observed for three contigs that likely 
reflects an intra-chromosomal translocation of a region of 
at least 4.6 Mb in the Triticeae lineage. The main limit of 
the synteny approach concerns the conservation at the 
micro-collinearity level. Several studies demonstrated a 
very good conservation of the micro-collinearity between 
cereals [55,56] but others showed some local rearrange- 
ments [57-59]. Here, 48 ordered BAG contigs contained 
more than three syntenic genes, for a total of 195 syntenic 
genes. Only 12 out of these 195 syntenic genes (6%) were 
not in the same order in wheat compared to the other 
grasses (data not shown). Because some of these differ- 
ences may be due to errors in the ordering of BAGs in the 
contigs by FPG, we estimate a maximum of 6% break in 
the micro-collinearity between wheat chromosome IBL 
and the orthologous regions in the others grass genomes. 

The comparison of our anchoring strategy with the 
GenomeZipper strategy [36, 37], which integratesgene- 
based genetic mapping information usingsynteny toother 
grass genomes, demonstrated thatthe difficulty in map- 
ping homoeologous genes unambiguously and the lack 
of recombination remain aserious limitation for ordering 
physical maps in hexaploid wheat. Two others strategies 
can be deployed to overcome the lack of recombination 
in wheat bi-parental genetic maps. The first one consists 
ofsimultaneously increasing the number of meiosis and, 
thus, the number of recombination and polymorphism 
by using more than two parents.Multi-parent advanced 
generation inter-cross (MAGIG) populations can be 
developed by intercrossing a number of parent lines (2"^) 
for n generations in a set mating designed to combine 
the genomes of all (2") parents in the progeny lines. In 
wheat, two MAGIG populations, with four and eight 
founders respectively,have been established [60]. Gompari- 
son of thefour-way MAGIG population with a consensus 
map incorporating over 100 bi-parental populations 
showed that several regions where markers could not be 
separated by recombination in the consensus map are 
spread out over 10 to 20 cM in the MAGIG map [60]. This 
increase in resolution within the four-way MAGIG popu- 
lation is expected to be even higher with a greater number 
of founders and therefore,the eight-way MAGIG popula- 
tion [60] represents an attractive solution for anchoring 
wheat physical maps in the future. Another possibility to 
increase recombination is to use Unkage disequilibrium 
(LD) mapping. The extent of LD,or non-random associa- 
tion of alleles at linked loci, depends on the recombination 
rate, the mating system, the domestication process, the 
natural and artificial selection, and the population structure 
[61]. LD can persist over tens to hundreds of kilobases in 
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selfing species like Arabidopsis thaliana[62\ or soybean 
[63] and can decline over a few hundred base pairs in out- 
crossing species like maize [64] . In wheat, the LD extent 
was estimated between 300 and 500 kb (r^ = 0.2) depend- 
ing on the geographical origin of the population analyzed 
[65]. LD mapping is developing rapidly in wheat through 
the construction of association panels [66-68] and the expo- 
nential increase in the number of markers,thereby offering 
new perspectives for anchoring physical maps in wheat. 

The second strategy to overcome the lack of recombi- 
nation in wheat is radiation hybrid mapping,which is 
completely independent from recombination as it relies 
on radiation-induced chromosome breakage and the 
reconstruction of the markers order based on co-reten- 
tion analysis [69,70]. A pilot study during the construc- 
tion of the 3B physical map indicated a resolution of 
263 kb for the 3B radiation hybrid mapping panel [14] 
and, more recently, Kumar et [71] estimated a resolu- 
tion <140 kb for a panel of the wheat D-genome pro- 
genitor Aegilops tauschii. Thus, radiation hybrid 
mapping provides a viable solution to resolving the 
order of physical maps in the two-thirds of the wheat 
chromosomes that show little or no recombination. 

A few years ago, physical mapping in hexaploid wheat 
seemed out of reach. The success of the 3B and IBL 
physical mapping projects and the many more physical 
maps to come in the near future through the coordi- 
nated efforts in the IWGSC [10] with theconstant 
improvement in methodologies opens new perspectives 
in wheat research. Wheat physical maps will gradually 
replace the aneuploid stocks that were used in the past 
decades for defining the location of markers and traits 
along wheat chromosomes. In addition,high resolution 
wheat physical maps will help to increase our under- 
standing of factors underlying recombination [25], which 
in turn should enable the manipulation and improve- 
mentof recombination in pericentromeric regions. 

High density physical mapping provides new insights 
into wheat genome evolution 

It is currently recognized that the common ancestor of 
grasses had, approximately 90 million years ago, five 
chromosomes, and underwent a whole genome duplica- 
tion followed by inter-chromosomal translocations and 
fusions resulting in an intermediate ancestral genome 
(approximately 70 million years ago) with 12 chromo- 
somes (Al to A12) [29,56]. In wheat, chromosomes from 
groups 1 and 3 originate from chromosomes A5 and Al, 
respectively, which resulted from the whole duplication 
of proto-chromosome A5[40]. While ancestral wheat 
chromosome 3 evolved directly from chromosome Al 
without major rearrangements, ancestral wheat chromo- 
some Iresulted from the insertion of chromosome AlO 
into chromosome A5[40]. As a consequence, only the 



distal 53% of chromosome IBL share common ancestry 
with the long arm of chromosome 3B. In rice, which has 
retained the same 12 chromosomes structure as the 
intermediate ancestral genome, the chromosomes synte- 
nic to wheat chromosome 3 and 1 are chromosomes 1 
and 5, respectively. Murat et al, [41] identified 64 genes, 
still conserved in today's rice, Brachypodium,2ind sor- 
ghum genomesfrom the duplication of ancestral proto- 
chromosome A5.A similarity search between the 128 
ancestrally duplicated rice genes and the Illumina contigs 
from the IWGSC survey sequencing of all wheat chromo- 
somes identified 12 genes on wheat chromosomes IBL 
and 3BL. When comparing the proportion of ancestral 
genes conserved from proto-chromosome A5 in wheat 
and rice, there wasabout four to eight times more reten- 
tion in rice. This supports evidence that in wheat the 
ancestral gene backbone has been more rearranged than 
in rice [72,73] and indicates that wheat is not a good tem- 
plate for the reconstruction of the ancestral grass chro- 
mosome content. Among the IBL gene set, 59.5% were 
identified as non-syntenic geneswith the other grass gen- 
omes confirming the hypothesis of active gene move- 
ments specific to the wheat lineage after divergence from 
the other grasses [18,28,73,74]. This ratio is very similar 
tothe estimates of Wicker et al. [28], who found 62.7% of 
non-syntenic genes using 454 sequencing of sorted chro- 
mosome IBL. Here, access to the physical map enabled 
us to further analyze their distribution along the chromo- 
some. The general pattern showed an increase in the pro- 
portion of non-syntenic genes from the centromere to 
the telomere, as previously observed in wheat [18,73-77]. 
However, apeak of synteny was observed in the proximal 
bin 1BL6-0.32-0-47. Interestingly, this bin corresponds to 
the distal end of the ancestral orthologous rice chromo- 
some 10. The increase in the proportion of syntenic 
genes from the centromere to the telomeres has been 
observed systematically on the Br achy podium, sorghum 
and rice chromosomes [41], reflecting a general pattern 
for the distribution of syntenic genes along ancestral 
grass chromosomes. Therefore, the peak of synteny 
observed in the middle of chromosome IB likely corre- 
sponds to the ancient telomeric synteny pattern whereas 
the increase of non-syntenic genes towards the telomere 
reflects the more recent history of the wheat genome 
evolution. Thus, we conclude that the synteny distribu- 
tion along chromosome IBL is the result of the superim- 
position of the ancestral grass and the recent wheat 
evolutionary patterns. Such apattern was not observed as 
clearly in a recent comparative study of Ae. tauschii, the 
diploid ancestor of the D genome of bread wheat [73] . In 
that study, the authors showed that the average synteny 
with both rice and sorghum chromosomes was signifi- 
cantly higher in the proximal half than in the distal half 
of the Ae. tauschii chromosome segments, reflecting the 
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increase of the proportion of non-syntenic genes from 
the centromere to the telomere as seen on chromosome 
IBL. However, they did not observe a significant differ- 
ence in the synteny level between the proximal and the 
distal halves of chromosome IDL that corresponds to the 
ancestral orthologous rice chromosome 10. This is likely 
due to the lower number of gene loci (21 versus 1,161) 
analyzed on chromosome IDL compared to chromosome 
IBL, illustrating the potential of high density physical 
mapping in comparative studies. 

On chromosome 3B, detailed sequence analysis of 13 
large contigs containing 175 genes distributed along the 
chromosome [18] and 2,924 genes anchored on the SB 
physical map [15] indicated that 35% to 42% of the genes 
are not syntenic with other grass genomes. These results 
suggest a lower proportion of non-syntenic genes on chro- 
mosome 3B compared to chromosome IBL (approxi- 
mately 60%). Moreover, no peak of synteny was observed 
on wheat chromosomes 3BL and 3BS,in which there is a 
continuous increase in the proportion of non-syntenic 
genes along the chromosome arms towards the telomeres 
[15]. These differences aredue to the different evolutionary 
origin of homoeologous group 3 chromosomesin wheat, 
which, in contrast to chromosomes of group 1, originate 
directly from a single ancestral chromosome (proto-chro- 
mosomeAl) [78]. Despite these differences, the density and 
proportion of gene islands,the gradient of gene density 
from the centromere to the telomere,and thecorrelation 
with the density of non-syntenic genes or the recombina- 
tion rate remained very similar between chromosome IBL 
andchromosome 3B [15]. Thus, our results show that 
wheat chromosomes display a superimposition of evolu- 
tionary patterns. Some, such as the pattern of synteny, will 
differ between two chromosomes as a reflection of ancient 
history while others, such as the pattern of gene island for- 
mation and non-syntenic gene movement, will be similar 
because they reflect wheat lineage-specific and more 
recent history. 

The IBL physical map: a landing padfor efficientmap- 
based cloning and sequencing 

The size of the bread wheat genome (17 Gb) is a consider- 
able challenge for map-based cloning. In particular, the 
large amount of repetitive DNA (>80%) and the presence 
of three homoeologous genomes (A, B and D genomes) 
represent major difficulties during chromosome walking. 
Consequently, only a few genes have been cloned in wheat 
so far [79]. In the absence of physical maps, comparative 
genomics has been used to supportmap-based cloning in 
wheat. The positional cloning of VRN genes on chromo- 
somes 5 A and 7BS [80-82], Phi on chromosome 5B [83], 
and Gpc-Bl on chromosome 6BS [84] in wheat benefited 
greatly from the available rice genome sequence. However, 
comparative analyses of wheat LriO[85], Lr21[86] and 



Pm3b[S7] disease-resistance genes with the rice genome 
sequence showed that it contains homologous genes to 
these three genes, but at non-orthologous positions, indi- 
cating that genomic rearrangement interruptscoUinearity 
of wheat and rice in some genomic regions. We confirmed 
these interruptions between wheat and the other grass 
genomes for chromosome IBL with a break of collinearity 
between deletion bins 1BL6-0.32-0.47 and lBLl-0.47-0.61, 
and the high level of non-syntenic genes (59.5%). These 
three disease-resistance genes were isolated by positional 
cloning using diploid or tetraploid wheat genomes that are 
closely related to the genomes of modern hexaploid wheat 
as an alternative to the use of the rice genome sequence. 
However, this approach is very time-consuming and 
requires a variety of genomic resources. The low number 
of genes cloned in the wheat genome and more particu- 
larly on chromosome IBL, for which none of the 40 QTLs 
mapped so far have yet been cloned, illustrates the current 
limits in positional cloning in wheat. 

Here, we provide a powerful tool for map-based cloning 
on wheat chromosome IBL with a high quality (93% of 
chromosome coverage, N50 = 1,128 kb) and very dense 
(11 markers per Mb) physical map including 1,161 genes, 
a good percentage of ordered contigs (48% of the chromo- 
some arm) and a high level of anchoring (74% in the dele- 
tion bins and 19% in the genetic map). This provides 
potentially 916 markers including 193 genes to each of the 
40 QTLs mapped on chromosome IBL. With such infor- 
mation tohand, it is possible to define a subset of markers 
ordered along the interval and initiate fine mapping in a 
high resolution recombinant population to rapidly identify 
physical contigs that can provide additional markers and 
candidate genes or be sequenced. For nine QTLs mapped 
in the distal part of the chromosome arm (ratio of genetic 
to physical distance = 0.6 cM/Mb) with a confidence inter- 
val of less than 3 cM and an N50 value for the contig size 
of 1.1 Mb, one to three contigs should be sufficient to 
cover such intervals,providing landing pads for a rapid 
identification of potential candidate genes. 

The physical map of wheat chromosome IBL provided 
here was built with a BAG library constructed from the 
genotype Ghinese Spring. However, in the final step of 
map-based cloning, it is often desirable to use a genomic 
library of the cultivar that contains the gene of interest. 
This problem was revealed in wheat by the analysis of the 
LrlO gene locus. At this locus, two haplotypes thatwere 
defined by the presence (HI) or absence (H2) of two resis- 
tance-gene analogs were found in a collection of 113 wild 
and cultivated wheat lines [88]. The isolation of LrlO was 
only possible because the BAG library used in the sub-gen- 
ome map-based cloning approach was constructed from a 
genotype belonging to the HI haplotype [85]. Thus, in 
some cases there will be a need to construct a new BAG 
library in another genotype than Ghinese Spring. Here, the 
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high density of markers anchored to the IBLphysical map 
(11 markers per Mb)combined with an adapted pooUng 
strategy of the new BAG library [89] should allow the 
rapid identification of BAG clones spanning the target 
region and chromosome landing. 

The IWGSG has established a road map for obtaining a 
high quality reference genome sequence [90] ofthe hexa- 
ploid bread wheat genome following a chromosome- 
based strategy [11] to overcome the difficulties associated 
with the high level of ploidy. Only a high quality assem- 
bly allows completegenome information to be captured 
accurately, in particular the information embedded in the 
repetitive fraction, which has been shown to play key 
roles in evolutionary changes and regulatory innovation. 
This is of crucial concern for the wheat genome as it is 
primarily composed of repetitive elements. In this regard, 
the high quality physical map of wheat chromosome IBL 
provides a robust platform for sequencing the IBL chro- 
mosome using a BAG-by-BAGapproach and adapted 
next-generation sequencing technologies. 

Conclusions 

Using a combination of efficient assembly tools and high 
throughput genotyping platforms, we developed a high 
quality physical map representing 94% of wheat chromo- 
some IBL. The map is anchored (74%) and ordered (48%) 
with 5,489 markers, representing the highest density of 
markers (11 markers per Mb) so far for a wheat physical 
map. This provides a powerful tool for map-based cloning 
and a robust platform for sequencing the IBL chromo- 
some in a BAG-by-BAG approach. The high density of 
genes mapped on the IBL physical map allowed us to gain 
new insights into the gene space organization. Further, it 
revealedthat the pattern of synteny along chromosome 
IBL is the result of the superimposition of the ancestral 
grass and recent wheat evolutionary patterns. 

Materials and methods 

BAC Fingerprinting and data processing 

A chromosome IBL-specific BAG library named 
TaaGsplBLhAcontaining 92,160 clones originating from 
sorted IBL wheat chromosome of Ghinese Spring was con- 
structed as described by Simkova et al [91]. Fingerprinting 
of all IBL BAG clones was performed as described in Paux 
et al, [14]. Briefly, it consisted in the digestion of BAG 
DNA by five restriction enzymes {BamHl, EcoRl, Xbal, 
Xhol and Haelll); labeling of the DNA fragments with the 
SNaPshot™ Multiplex Labeling Kit solution (Applied Bio- 
systems, Foster Gity, GA, USA); and estimation offragment 
sizes on an ABI 3730XL DNA capillary sequencer (Applied 
Biosystems). 

Data were processed using the GeneMapper, Finger- 
print Background removal (FPB) and GenoProfiler pro- 
grams to size the fragments and remove background 



noise and contaminations. In particular, raw electro- 
pherograms produced by the ABI Data GoUection soft- 
ware were analyzed using GeneMapper: fragment sizing 
was performed without the 250base pair (bp) band of 
the 500LIZ GeneScan size standard file. Peak areas, 
peak heights and fragment sizes of each BAG fingerprint 
profile were exported in text format. Spurious peaks 
(background noise, vector bands, partial or unspecific 
digestions) and bands out of the range of 50 to 500 bp 
were removed by FPB; this software was also used to 
discard substandard profiles that may negatively affect 
contig assembly and to convert data into a format com- 
patible with the GenoProfiler and FPG programs. Geno- 
Profiler was used to detect cross-contaminated clones in 
384-well and 96-well plates and to remove negative 
controls. 

At the end, a total of 65,413 highquality fingerprints 
(71%) were obtained with an average number of scored 
bands per BAG fingerprint of 107 ±25 (ranging from 30 to 
216). With an average insert size of 129 ±29 kb, the total 
size coverage of the fingerprinted BAGs was 8,424 Mb. 

BAC assembly 

Two software packages were used to build the physical 
map using the 65,413 high quality fingerprints: FPG and 
LTG. With the FPG software, automated assemblies were 
performed using the methodology described by Paux et al 
[14] for the construction of the physical map of chromo- 
some 3B. Briefly, the initial build of chromosome IBL was 
performed by incremental contig building with a cut-off of 
le'^^ These were subsequently run through single-to-end 
merging (Match: 1) at six successively higher cut-offs end- 
ing at le'^^. The DQer function was used at each cut-off 
to break up all contigs that contained more than 10% of 
Questionable (Q) clones (Step: 3). The following para- 
meters were used to establish the FPG physical map: a tol- 
erance of 12, a gel length of 56,000 and a From End value 
of 55. 

With the LTG software, the automated assemblies were 
performed using the following methodology: the same 
metric used with FPG, called the Sulston score, was used 
to calculate clone overlaps. Numerous groups of overlap- 
ping clones, called net of significant overlaps, were 
obtained with a cut-off of le'^^. Then, subnetswere 
obtained at a cut-off of le'^^ and used to build contigs. All 
contigs with 5 to 999 clones and with linear topology (net 
width <1) were validated. All contigs with a width >2 were 
checked manually for their linearity. If only one clone 
explained the non-linearity, the contigs were validated 
because this non-linearity was likely due to the bad quality 
of the fingerprint for this clone. Then, all contigs with 
more than 999 clones and all non-linear contigs were bro- 
ken up by the elimination of Q clones and Q overlaps. 
The same featuresas described previously (linearity and 
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number of clones) were used to validate contigs. The 
remaining contigs were broken up by increasing the strin- 
gency (cut-off of le'^^) and then elimination of Q clones 
and Q overlaps for the non-validated contigs. The follow- 
ing parameters were used to establish the LTC physical 
map: a tolerance of 12, a gel length of 60,000,and a mini- 
mum contig size of five clones.A manual version of the 
LTC physical map constructionwas then performed by 
identifying fingerprint overlaps with a lower stringency 
(cut-off of le'^^), supported by information provided by 
contig anchoring in deletion bins with molecular markers 
as described in Paux et al, [14]. A home-made Perl script 
called FingMergingContigs.pl was developed to automate 
the identification of the contigs that need to be manually 
merged.LTC does not provide a consensus band map, 
which is needed to assign coordinates to the clones based 
on their alignment to the map [21] and allow theeasy 
ordering of markers inside physical contigs. Thus,to pro- 
vide a gene order on the LTC physical map, the 616 LTC 
contigs were rebuilt with the FPC software to produce a 
consensus band map for each LTC contig. 

Minimal tilling path design and 3-dimentional pooling 

The MTP was selected from the FPC automated physical 
map using the FPC software. The following parameters 
were used: a minimum FPC overlap of 30, a maximum 
FPC overlap of 250, a From End of 0 and minimum shared 
bands of 12. The three-dimensional pooUng of the 8,597 
clones of the MTP selected from the IBL BAC library was 
performed as described in Paux et al [14]. Itresulted in 24 
column pools, 16 raw-pools, 23 plate pools, and a super 
pool containing all 8,597 BACs clones. 

Marker development 

DNA amplified from IBL sorted chromosomes was used 
for 454 shotgun sequencing as described by Wicker et al 
[28]. A total of 2,177,887 good quality reads were obtained 
with an average read size of 383 bp and a total size of 
834 Mb corresponding to a coverage of 1.6 X. Two home- 
made Perl scripts, IsbpFinder.pl and ssrFinder.pl, were 
used to develop ISBP and SSR markers, respectively, from 
the 454 reads. In total, 775,995 ISBPs and 38,400 
SSRswere designed. Other home-made Perl Scripts were 
developed to automatically discard duplicated markers 
and select the best quality markers. The quality criteria- 
were the level of confidence given by IsbpFinder.pl [29] 
for the ISBP markers and more than 10 repeats for dinu- 
cleotide repeats for the SSR markers. Finally, 46,194 ISBP 
and 412 SSR corresponding to a total of 46,606good qual- 
ity and non-redundant markers were identified. 

Plant material and DNA extraction 

The specificity of the markers for chromosomelBL and 
marker assignment by deletion bin mapping was 



performed with aneuploid lines of Chinese Spring corre- 
sponding to: a nullisomic IB-tetrasomic line, a ditelosomic 
IBL line, a ditelosomic IBS line [31,32], and eight deletion 
bin lines (lBLll-0.23, 1BL6-0.32, lBLl-0.47, 1BL14-0.61, 
1BL2-0.69, 1BL8-0.74, 1BL3-0.85 and 1BL4-0.89) [35]. 

The reference genetic mapping population was the same 
as the one described by Saintenac et al [25] and was 
derived from the cross between cultivars Chinese Spring 
and Renan. Fl plants were self-fertilized and approxi- 
mately 1,300 F2 seeds were sown to produce a single seed 
descent population. Among these lines, a set of 381 was 
used for genetic mapping. For each F2, 10F3 seeds were 
sown and leaves were harvested at a three-leaf stage for 
DNA extraction according the procedure described in 
Saintenac et al [25]. 

Genetic mapping 

A total of 84 molecular markers (48 SSRs and 36 ISBPs) 
previously assigned to chromosome arm IBL exhibited 
polymorphism between Chinese Spring and Renan and 
were selected for linkage analysis on the crossed popula- 
tion. The genetic map was constructed based on the 
maximum likelihood method using Mapmaker software 
[92] with a log of odds of 3 and 9 of 0.25, applying the 
Kosambi [93] mapping function to transform recombi- 
nation fractions into cM. 

The chromosome IBL neighbor map was constructed 
following the same strategy as for chromosome 3B [14]. 
Briefly, the Chinese Spring xRenan genetic map from 
chromosome IBL was used as a framework on which the 
position of loci mapped in another population was extra- 
polated. Loci shared between two maps were identified 
and used to define genetic intervals in which loci not pre- 
sent on the framework map (hereinafter referred to as 
'target loci') were listed. Then, distances between shared 
and target loci were calculated as a ratio of the distance 
of the genetic interval and used, ultimately, to estimate 
the coordinate of the target loci on the neighbor map. 
The neighbor map was constructed with segregating data 
from the following mapping populations: Chinese Spring x 
Renanas a framework; W7984 x Opata (the ITMI reference 
population; GrainGenes[27]); RL4452 x AC Domain, 
SC8021-V2 X AC Karma, Wuhan x Nyubai (three popula- 
tions from Agriculture Canada integrated in a consensus 
map including the ITMI map; Somers et al [50]; Banks 
et al [49]); Courtot x Chinese Spring[48]; and a wheat 
composite map originating from several tens of populations 
(R. Appels, personal communication). 

PGR amplification 

PCR amplification was carried out in 10 (iL reaction 
volume containing 5 (iL of AmpliTaq gold 360 master 
mix (Applied Biosystems), 0.4 (iL 360 GC enhancer 
(Applied Biosystems), 1 (iL syto9 (2 (iM), 1.6 (iL forward 
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and reverse primer (3 \iM) and 2\iL of 1/200 phi29 MTP 
pool amplification. PGR conditions were as follows: 
initial denaturation at 95°C for 10 min followed by 47 
cycles of denaturation at 95°C for 30 s, annealing at 
62°C for 30 s, extension at 72°C for 30 s, and a final 
extension at 72°C for 5 min. The PGR plates were then 
run on LightCycler 480 (Roche Diagnostics, Meylan, 
France!) to obtain dissociation curves. Results were then 
analyzed on LightCycler 480 Software release 1.5.0. 

NimbleGenarray development 

High confidence ISBP markers were selected from the 
46,194 ISBPsdesigned in the section 'markers develop- 
ment'. A mathematically defined repeats index using 
the Talymer program [94] and a Perl-based script were 
used to select specifically ISBP markers comprising a junc- 
tion between a TE and a stretch of 30mer of low-copy 
sequence. Selected were 17,788 ISBP markers with an aver- 
age of five probes per marker (88,470 probes), 27 positive 
controls (TE from wheat, 211 probes) and 20 negative 
controls (TE from several other species; 159 probes). 
Probes were 50 to 58 nucleotides in length (30mers for the 
low-copy DNA and a varying length of 20 to 28 nucleo- 
tides for the TE), with a temperature of melting (Tm) 
between 68°C and 73°C (Tm = 41 ((nG + mC) - 16.4)/ 
L + 64.9, where L = length of the oligonucleotide), and a 
GC content between 40% and 60%. All probes were 
oriented 5' LowCopyDNA_TE 3'(3' end being adjacent to 
the array surface). Selected probes were sent to design at 
Roche NimbleGen. All probes were randomly synthesized 
and spotted in a high density custom NimbleGen 12xl35k 
array(Roche NimbleGen, Inc.). 

Microarray hybridizations 

For the 63 IBL chromosome MTP BAG pools,500 ng of 
DNA was labeled using the NimbleGen Dual color label- 
ing kit (Roche NimbleGen Inc.) according to the manufac- 
turer s protocol. Dual color hybridizations were performed 
on each plex of the arrays. For the IBL sorted chromo- 
somes, 300 ng of DNA was labeled using the same kit. A 
dye swap was performed for this sample. 

An updated version [A-MEXP-2314] of thewheat Nim- 
bleGen 40k unigene microarray[15], was hybridized 
according to the manufacturer's protocol with an addi- 
tional vortexing step ofthe hybridization solution master 
mix (chapter 4 steps 1 to 4 of the protocol for hybridiza- 
tion and washing). Hybridization was performed for 
72 hours at 42°C instead of 24 hours as in the original pro- 
tocol. Washing steps were performed according to the 
manufacturer's protocol for gene expression analysis 
(Roche NimbleGen Inc.) 

Hybridization and washing of the NimbleGen 17k IBL 
ISBP arraywere performed according to manufacturer's 
procedure except that the hybridization time was extended 



to 120 hours. The arrays were dried by centrifugation at 
1,200 X g for 1 min. 

The arrays were scanned using the InnoScan 900AL 
scanner (Innopsys, Carbonne, France). Data were 
extracted from scanned images using the NimbleScan 2.5 
Software (Roche NimbleGen Inc.) that allows for auto- 
mated grid alignment, extraction and generation of data 
files. 

Microarray hybridization results analysis 

For the wheat NimbleGen 40k unigene microarray,the 
normalization of the MTP pool data was done using the 
methods developed by Rustenholz et al, [15,16]. Two 
thresholds were calculated:the 'mean + x x Standard 
Deviation' with the following coefficient for the plates, col- 
umns and rows:plate: 2.7, 2.6, 2.5, 2.4, 2.3, 2.2, 2.1 and 2; 
row: 2.8, 2.7, 2.6, 2.5, 2.4, 2.3 2.2 and 2.1; column: 3.0, 2.9, 
2.8, 2.7, 2.6, 2.5, 2.4, 2.3 and 2.2, andthe 't-test' method 
using the same thresholds as Rustenholz et al, [16]. All 
probes with a probe signal above these thresholds were 
considered positive. 

The normalization of the sorted IBL chromosome data 
was done using automated script developed with the R 
software [95]. A lowess correction was used to correct the 
dye biases. Afterwards the corrected intensity values were 
checked for each gene and the aberrant values deleted. 
Then, for each value, the median value of all genes was 
subtracted and divided by their standard deviation. Finally, 
the positive genes were identified with the 'mean + x x 
Standard Deviation' as described above. 

For the IBL ISBP NimbleGen array,the normalization 
was performed using automated scripts developed with 
the R software. The background intensity was estimated 
using the median of the intensities of the empty spots and 
subtracted fromthe intensity of each spot. After a log2 
transformation, linear-lowess normalization was per- 
formed. Data were then subtracted by the median of the 
total spot intensity and divided by the standard deviation. 
For each probe,two thresholdswere calculated:the mean + 
2 X the standard deviation and a Student's t-test at a P- 
value threshold of 0.05 were performed. All markers with 
at least 75% of their probes above this threshold and with 
a P-value below 0.05 were considered positive for a pool. 

Three-dimensional MTP pools and plate pools data 
deconvolution 

A home-made Perl script, called DSMP.pl was developed 
for the deconvolution of molecular marker screening data 
produced on the three-dimensionalBAC pools of the 
MTP. Essentially, the script checks if two overlapping 
BACs of the physical map explain all the positive pools 
and can alsodeconvolute three others types of results: two 
non-overlapping BACs from two different contigs with fin- 
gerprints matching at acut-off of le'^^, a tolerance of 12, 
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and a MTP addresses that explain all the true positive 
pools; two overlapping BACs with one fingerprint match- 
ing a third BAG, included in a contig other than the two 
overlapping BACs, at a cut-off of le'^^, a tolerance of 12, 
and a MTP addressing these three BACs that explain all 
the real positive pools;and a pair of overlapping BACs 
with one fingerprint matching a BAC in a second pair of 
overlapping BACs, included in a contig other than the two 
first overlapping BACs, at a cut-off of le'^^, a tolerance of 
12, and a MTP addressing these fourBACs that explain all 
the true positive pools. TheDSMP.pl script is available 
upon request from the corresponding author. 

The molecular marker screening data produced on the 
plate pools of the whole IBL BAC library were deconvo- 
luted with the Elephantsoftware and the default para- 
meters [33]. Briefly, Elephantpartitions the contigs into 
short sections by splitting the contig at each branching 
point and establishes a list of clones for each segment; 
for each marker, it combines the results from pool 
screening with the pool composition to establish a list 
of candidate clones harboring the markers; it compares 
the two lists and scores each segment; and finally, for 
each marker, if a unique segment had a score above the 
threshold of 13, it assigns the marker to the segment. 

Roche454 sequence information 

The Roche454 sequencing of the IBL sorted chromo- 
some was performed by Wicker et al, [28]. They pro- 
duced 2,177,887 reads and covered 834 Mb (1.6 X). All 
sequence information generated was deposited to the 
European Bioinformatics Institute short-read archive 
under the accession number [ERX009439]. 

Sequence analysis 

Sequences were analyzed using Basic Local Alignment 
Search Tool (BLAST) software [96]. BLASTN analyses 
of all the probes of the wheat NimbleGen 40k unigene 
microarray [15] against all the Roche454 reads of sorted 
chromosomelBL were performed to identify the IBL 
unigenes present on the NimbleGen microarray. The 
results were parsed to keep the best hit with at least 
98% of sequence identity on at least 57 bp. Every uni- 
gene with a hit meeting these criteria wasconsidered as 
aunigene originating from chromosome IBL. 

BLASTXanalyses of the 39,179 unigenes sequences 
represented on the wheat NimbleGen 40k microarray 
against the databases of all rice {Oryza sativa) [97], 
B. distachyon[98] and sorghum {Sorghum bicolour) pep- 
tides [99] were performed to identify orthologous genes 
to the genes mapped on wheat chromosome IBL. The 
results were parsed to keep the best hits with at least 
35% of sequence identity on at least 40 amino acids. 
Every unigene with a hit meeting these criteria was 



considered as orthologous of the rice, sorghum or B. 
distachyon gene identified. 

To eliminate redundancy in the unigene set, we used 
information from orthologous genes in rice and B. dis- 
tachyon. We also checked with the latest release of the 
wheat unigene build (version 59) for further redundancy 
compared to the build version 55 that was used to build 
the wheat NimbleGen 40K unigene microarray[15]. 
When unigenes were located on the same BAC(s), had 
the same orthologous genes in rice and/or in B.distach- 
yoHy and were grouped in the same cluster of the new 
build version 59, only one unigene was kept on the IBL 
physical map. 

TBLASTXanalyses of the coding region of 128 rice 
genes corresponding to 64 ancient duplicated genes 
between rice chromosome 1 and 5 as defined by Mura- 
tet al [41] were performed against all the sequence con- 
tigs from the IWGSC survey sequencing of all wheat 
chromosome. The results were parsed to keep the 
cumulated hits with at least 35% of sequence identity on 
at least 70% of the coding region of the rice genes. 

IBL GenomeZipper construction 

The GenomeZipper [36,37] of chromosome IBL was buil- 
tusing a comparative framework with reference grass gen- 
omes and 242 gene-based single nucleotide polymorphism 
markers from thosemapped by genotyping by sequencing 
on chromosome IBL [38]. To identify and position genie 
regions covered by the wheat IBL sequence contigs 
(198,968 contigs) produced by the IWGSC [10], repetitive 
elements were detected and maskedby comparingthe 
wheat IBL assembled sequences against the MIPS-REdat 
Poaceae v8.6.2 repeat reference library using Vmatch [100] 
and the following parameters: 70% identity cut-off, 100 bp 
minimal length, seed length 14, exdrop 5, and e-value 
0.001. These sequences were then sequence masked and 
not considered for the construction of the genome zipper. 

To identify syntenic conserved genes in the repeat-fil- 
tered wheat IBL sequence contigs, sequence comparisons 
(BLASTX, >75%/70% sequence identity, alignment length 
>30 amino acids) against three reference genomes of B. dis- 
tachyon (genome annotation vl.2,[101]), rice (rice RAP-DB 
genome build 4[97]) and sorghum (genome annotation 
vl.4, [99]) were performed. Syntenic regions corresponding 
to wheat IBL were calculated usinga sliding window 
approach (windows size 0.5 Mbp, shift size 0.1 Mbp) to 
define synteny by the density of homology matches 
between query and reference genome. 

The corresponding orthologous genes were anchored 
to the marker backbone via bi-directional blast hits. 
Genes without marker association were located accord- 
ing to their position in the corresponding reference gen- 
omes. The obtained IBL linear ordered gene map was 
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then expanded using IBL repeat filtered contigs, wheat 
ESTs (vl.l9) [39], and wheat hill-length cDNAs [102]. 

Gene ordering on chromosome IBL 

The ordering of genes along chromosome IBL was 
based first on the ordering of the physical contigs and 
second on the position of the genes within contigs. Each 
IBL gene was assigned to one or several overlapping 
BACs using the information from the 40k unigenes 
NimbleGen array.The position of each gene in each con- 
tig was estimated based on the average of the consensus 
band map coordinate of the BAC(s) containing the gene. 
When several genes were assigned to the same BAC(s), 
the order was based on the synteny information when 
available or randomly chosen if not. The position of the 
genes with a clear position relative to their neighbor or 
with synteny information was consideredto have high 
confidence. The position of the genes with unclear pla- 
cement with their neighbor genes or with discrepancy 
between synteny information and position in the contig 
was consideredto have low confidence. 

Comparison of the IBL virtual gene orders based on the 
physical map and the GenomeZipper 

To compare the two virtual gene orders, we identified 
the common genes. BLASTN [96] analyses of all the 
1,161 IBL unigenes against all the IBL Roche454 reads 
and ESTs of the GenomeZipper were performed. All 
genes with at least 90% of sequence identity on at least 
100 bp were considered as common genes. Moreover, 
we compared the orthologous genes identified in rice, 
sorghum and B. distachyon for the GenomeZipper and 
the 1,161 IBL unigenes andall genes with the same 
orthologous genes in one or more of the three cereals 
were considered as common genes.Figure 3D was drawn 
using Circos [103] to compare the order of the common 
genes between the virtual gene orders based on the phy- 
sical map andthe GenomeZipper. 

Data availability 

A genome browser of the physical map of the wheat chro- 
mosome IBL is available from the Unite de Recherche 
Genomique Info website [34]. All the NimbleGen array 
design and all microarray data have been deposited to 
ArrayExpress [104] under accession numbers [A-MEXP- 
23 14] for the wheat NimbleGen 40k unigene design, [A- 
MEXP-2312] for the IBL ISBP NimbleGen array design 
and [E-MTAB-1657] for the ISBP hybridization experi- 
ment and [E-MTAB- 1650] for the wheat NimbleGen 40k 
unigene hybridization experiment. The Roche454 
sequences of the IBL sorted chromosome areaccessible at 
the European Bioinformatics Institute short-read archive 
under the accession number [ERX009439]. 



Additional material 



Additional file 1: Information on the 5,489 markers assigned to the 
physical map of wheat chromosome IBL. Table listing the marker 
name, the marker type, the names of the deletion bin,the ID of the BAC 
clones andthe name of the physical contigsto which the marker is 
assigned,the position (in cM) of the markerson the Chinese Spring 
xRenanF2 genetic map,the order of the geneand the level of confidence 
in this order, the unigene length, the ID of the orthologous genes in 
rice, sorghum and Brachypodium and the gene islands number for the 
1,161 genes assigned to the physical map of the wheat chromosome 
IBL 

Additional file 2: Features of the 616 contigs of the final LTC 
physical map of the wheat chromosome IBL. Table listing the contig 
name, the number of clones, the contig size, the total number of 
markers, the number of genes, the name of the deletion bin to which 
the contig is assigned, the position on the IBL neighbor genetic map 
and the number of the ordered contigs. 

Additional file 3: GenomeZipper of wheat chromosome IBL. Table 
listing the marker name, the position on the genotyping by sequencing 
genetic map, the ID of the orthologous genes in rice, sorghum and 
Brachypodium, the ID of the full-length cDNA matching the marker or 
the orthologous genes, the wheat lllumina reads matching the marker or 
orthologous gene and the ID of the wheat ESTs matching the marker or 
orthologous gene. 
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